Search results for "Text compression"

showing 7 items of 7 documents

On parsing optimality for dictionary-based text compression—the Zip case

2013

Dictionary-based compression schemes are the most commonly used data compression schemes since they appeared in the foundational paper of Ziv and Lempel in 1977, and generally referred to as LZ77. Their work is the base of Zip, gZip, 7-Zip and many other compression software utilities. Some of these compression schemes use variants of the greedy approach to parse the text into dictionary phrases; others have left the greedy approach to improve the compression ratio. Recently, two bit-optimal parsing algorithms have been presented filling the gap between theory and best practice. We present a survey on the parsing problem for dictionary-based text compression, identifying noticeable results …

Theoretical computer scienceComputer scienceData_CODINGANDINFORMATIONTHEORYTop-down parsingcomputer.software_genreTheoretical Computer ScienceParsing optimalityCompression (functional analysis)Discrete Mathematics and CombinatoricsLossless compressionParsingLZ77 algorithmSettore INF/01 - InformaticaDeflate algorithmbusiness.industryDictionary-based text compressionComputational Theory and MathematicsData compressionDEFLATECompression ratioArtificial intelligencebusinesscomputerNatural language processingBottom-up parsingData compressionJournal of Discrete Algorithms
researchProduct

The rightmost equal-cost position problem.

2013

LZ77-based compression schemes compress the input text by replacing factors in the text with an encoded reference to a previous occurrence formed by the couple (length, offset). For a given factor, the smallest is the offset, the smallest is the resulting compression ratio. This is optimally achieved by using the rightmost occurrence of a factor in the previous text. Given a cost function, for instance the minimum number of bits used to represent an integer, we define the Rightmost Equal-Cost Position (REP) problem as the problem of finding one of the occurrences of a factor whose cost is equal to the cost of the rightmost one. We present the Multi-Layer Suffix Tree data structure that, for…

FOS: Computer and information sciencesOffset (computer science)Computer scienceSuffix treeComputer Science - Information Theorylaw.inventionCombinatoricslawLog-log plotComputer Science - Data Structures and AlgorithmsCompression schemetext compressiondictionary text compressionData Structures and Algorithms (cs.DS)LZ77 compressiondata compressionLossless compressionfull text indexSuffix Tree Data StructuresSettore INF/01 - InformaticaInformation Theory (cs.IT)Data structurePrefixCompression ratioCompression scheme; Constant time; Suffix Tree Data StructuresAlgorithmData compressionConstant time
researchProduct

Dictionary-symbolwise flexible parsing

2012

AbstractLinear-time optimal parsing algorithms are rare in the dictionary-based branch of the data compression theory. A recent result is the Flexible Parsing algorithm of Matias and Sahinalp (1999) that works when the dictionary is prefix closed and the encoding of dictionary pointers has a constant cost. We present the Dictionary-Symbolwise Flexible Parsing algorithm that is optimal for prefix-closed dictionaries and any symbolwise compressor under some natural hypothesis. In the case of LZ78-like algorithms with variable costs and any, linear as usual, symbolwise compressor we show how to implement our parsing algorithm in linear time. In the case of LZ77-like dictionaries and any symbol…

Theoretical computer scienceComputer science[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS][INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]Data_CODINGANDINFORMATIONTHEORY0102 computer and information sciences02 engineering and technologycomputer.software_genre01 natural sciencesDirected acyclic graphTheoretical Computer ScienceConstant (computer programming)020204 information systemsEncoding (memory)Optimal parsing0202 electrical engineering electronic engineering information engineeringDiscrete Mathematics and CombinatoricsStringologySymbolwise text compressionTime complexityLossless compressionParsingSettore INF/01 - InformaticaDictionary-based compressionOptimal Parsing Lossless Data Compression DAGDirected acyclic graphPrefixComputational Theory and MathematicsText compression010201 computation theory & mathematicsAlgorithmcomputerBottom-up parsingData compressionJournal of Discrete Algorithms
researchProduct

A trie-based approach for compacting automata

2004

International audience; We describe a new technique for reducing the number of nodes and symbols in automata based on tries. The technique stems from some results on anti-dictionaries for data compression and does not need to retain the input string, differently from other methods based on compact automata. The net effect is that of obtaining a lighter automaton than the directed acyclic word graph (DAWG) of Blumer et al., as it uses less nodes, still with arcs labeled by single characters.

automataComputer scienceSuffix tree[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]suffix tree0102 computer and information sciences02 engineering and technologyω-automaton01 natural sciencesindex text compressionlaw.inventionlawfactor and suffixTrie0202 electrical engineering electronic engineering information engineeringAutomata and formal languagesPattern matchingDirected acyclic word graphString (computer science)Directed graphDirected acyclic graphMobile automatonAutomaton010201 computation theory & mathematics020201 artificial intelligence & image processingAlgorithmComputer Science::Formal Languages and Automata Theory
researchProduct

Boosting Textual Compression in Optimal Linear Time

2005

We provide a general boosting technique for Textual Data Compression. Qualitatively, it takes a good compression algorithm and turns it into an algorithm with a better compression performance guarantee. It displays the following remarkable properties: (a) it can turn any memoryless compressor into a compression algorithm that uses the “best possible” contexts; (b) it is very simple and optimal in terms of time; and (c) it admits a decompression algorithm again optimal in time. To the best of our knowledge, this is the first boosting technique displaying these properties.Technically, our boosting technique builds upon three main ingredients: the Burrows--Wheeler Transform, the Suffix Tree d…

Theoretical computer scienceBurrows–Wheeler transformSuffix treeString (computer science)Data_CODINGANDINFORMATIONTHEORYBurrows-Wheeler transformSubstringArithmetic codinglaw.inventionLempel-Ziv compressorsArtificial IntelligenceHardware and ArchitectureControl and Systems Engineeringlawtext compressionempirical entropyArithmetic codingGreedy algorithmTime complexityAlgorithmSoftwareInformation SystemsMathematicsData compression
researchProduct

Optimal Parsing for Dictionary-Based Compression

2013

Dictionary-based compression algorithms include a parsing strategy to transform the input text into a sequence of dictionary phrases. Given a text, such process usually is not unique and, for compression purposes, it makes sense to find one of the possible parsing that minimise the final compression ratio. This is the parsing problem. In more than 30 years of history of dictionary-based text compression only few optimal parsing algorithms were presented. Most of the practical dictionary-based compression solutions need or prefer to factorise the input data into a sequence of dictionary-phrases and symbols. Those two output categories are usually encoded via two different encoders producing …

Settore INF/01 - Informaticaoptimal parsingtext compressiondata compressionLZ-compression
researchProduct

Optimal Parsing for Dictionary Text Compression

2012

Dictionary-based compression algorithms include a parsing strategy to transform the input text into a sequence of dictionary phrases. Given a text, such process usually is not unique and, for compression purpose, it makes sense to find one of the possible parsing that minimize the final compression ratio. This is the parsing problem. An optimal parsing is a parsing strategy or a parsing algorithm that solve the parsing problem taking account of all the constraints of a compression algorithm or of a class of homogeneous compression algorithms. Compression algorithm constrains are, for instance, the dictionary itself, i.e. the dynamic set of available phrases, and how much a phrase weights on…

Text Compression;Settore INF/01 - InformaticaText Compression
researchProduct